The EPHA2 gene is associated with cataracts linked to chromosome 1p.

PURPOSE
Cataracts are a clinically and genetically heterogeneous disorder affecting the ocular lens, and the leading cause of treatable vision loss and blindness worldwide. Here we identify a novel gene linked with a rare autosomal dominant form of childhood cataracts segregating in a four generation pedigree, and further show that this gene is likely associated with much more common forms of age-related cataracts in a case-control cohort.


METHODS
Genomic DNA was prepared from blood leukocytes, and genotyping was performed using single nucleotide polymorphism (SNP) and short tandem repeat (STR) markers. Linkage analyses were performed with the GeneHunter and MLINK programs, and association analyses were performed with the Haploview and Exemplar programs. Mutation detection was achieved by PCR amplification of exons and di-deoxy cycle-sequencing.


RESULTS
Genome-wide linkage analysis with SNP markers identified a likely disease-haplotype interval on chromosome 1p (rs707455-[approximately 10 Mb]-rs477558). Linkage to chromosome 1p was confirmed using STR markers D1S2672 (LOD score [Z]=3.56, recombination fraction [theta]=0) and D1S2697 (Z=2.92, theta=0). Mutation profiling of positional-candidate genes detected a heterozygous transversion (c.2842G>T) in exon 17 of the gene coding for Eph-receptor type-A2 (EPHA2) that cosegregated with the disease. This missense change was predicted to result in the non-conservative substitution of a tryptophan residue for a phylogenetically conserved glycine residue at codon 948 (p.G948W), within a conserved cytoplasmic domain of the receptor. Candidate gene association analysis further identified SNPs in the EPHA2 region of chromosome 1p that were suggestively associated with age-related cataracts (p=0.007 for cortical cataracts, and p=0.01 for cortical and/or nuclear cataracts).


CONCLUSIONS
These data provide the first evidence that EPHA2, which functions in the Eph-ephrin bidirectional signaling pathway of mammalian cells, plays a vital role in maintaining lens transparency.

In addition to conventional linkage studies of inherited cataracts in extended families, sibling and twin studies indicate that genetic risk factors may account for 14%-48% of the heritability for age-related "nuclear" cataracts [26][27][28], and 24%-75% of the heritability for age-related "cortical" cataracts.
SNP genotyping and linkage analysis: For genome-wide linkage analysis, genotyping was performed by means of the HumanLinkage-12 Genotyping Beadchip and the Infinium-II whole-genome amplification and single-base extension assay (Illumina, San Diego, CA) in the Microarray Core Facility at Washington University Genome Center. Parametric multipoint linkage analysis was performed with GeneHunter version 2.1r5 from the easyLINKAGE Plus version 5.08 package [37]. SNP marker allele frequencies used for linkage analysis were those calculated for Caucasians by the HapMap project. A gene frequency of 0.001 and a penetrance of 100% were assumed for the disease locus. Computation was performed with GeneHunter in sets of 100 markers.
STR genotyping and linkage analysis: STR markers from the National Center for Biotechnology Information (NCBI) combined Généthon, Marshfield, and deCODE genetic linkage maps were genotyped by means of a 4200 DNA analyzer running Gene ImagIR software (Li-Cor, Lincoln, NE) as described previously [38]. Pedigree and haplotype data were managed using Cyrillic (v.2.1) software (FamilyGenetix Ltd., Reading, UK), and two-point LOD scores (Z) were calculated using the MLINK sub-program from the LINKAGE (5.1) package of programs [39]. Marker allele frequencies were assumed to be equal, and a gene frequency of 0.0001 and a penetrance of 100% were assumed for the disease locus.
Mutation analysis: The genomic sequence for EPHA2 was obtained from the Ensembl human genome browser, and gene-specific M13-tailed PCR primers (Table 1) were selected from the NCBI re-sequencing amplicon (RSA) probe database.
Genomic DNA (2.5 ng/µl, 20 µl reactions) was amplified (35-40 cycles) in a GeneAmp 9700 thermal cycler using AmpliTaq polymerase (Applied Biosystems, Foster City, CA) and gene-specific primers (10 pmol). Resulting PCR amplicons (~250-650 bp) were either enzyme-purified with ExoSAP-IT (USB Corporation, Cleveland, OH) or gel-purified with the QIAquick gel-extraction kit (Qiagen). Purified amplicons were direct cycle-sequenced in both directions with BigDye Terminator Ready Reaction Mix (version 3.1) containing M13 forward or reverse sequencing primers, then ethanol precipitated and detected by capillary electrophoresis on a 3130xl Genetic Analyzer running Sequence Analysis (version 5.2) software (Applied Biosystems) and Chromas (version 2.23) software (Technelysium, Tewantin, Queensland, Australia). For allele-specific PCR analysis, exon-17 was amplified with three primers (Ex17R1, Ex17SF, and T-alleleF; Table 1), and resulting amplicons were visualized at 302 nm following electrophoresis in 3% agarose-gels stained with GelRed (Biotium, Hayward, CA).
Odds ratios were calculated using the Exemplar program version 4.04 (Sapio Sciences, York, PA). Hardy-Weinberg equilibrium was assessed using a χ² test implemented in the Exemplar program.
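The Hardy-Weinberg check mentioned above can be illustrated with a minimal Python sketch. It is not the Exemplar implementation, whose internals are not described here; it is the textbook Pearson χ² test with one degree of freedom for a bi-allelic marker. The function names and the genotype counts in the usage note are hypothetical.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic (1 d.f.) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts for a bi-allelic marker.
    Assumes both alleles are present (otherwise an expected count is zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Chi-square critical value for 1 degree of freedom at p = 0.05.
CRITICAL_1DF_P05 = 3.841


def consistent_with_hwe(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """True when the statistic stays below the p = 0.05 critical value."""
    return hwe_chi_square(n_aa, n_ab, n_bb) < CRITICAL_1DF_P05
```

For example, the hypothetical counts 25/50/25 match the Hardy-Weinberg expectation exactly and give a statistic of 0, whereas 50/0/50 (no heterozygotes at all) gives 100 and would be rejected.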

Linkage studies:
We investigated a four-generation white family from the United States (family Mu) segregating "posterior polar" cataracts in the absence of systemic abnormalities (Figure 1A). Autosomal dominant inheritance was supported by father-to-son transmission in the absence of gender bias or skipping of generations. Ophthalmic records indicated that the cataracts usually presented in both eyes as disc-shaped posterior sub-capsular opacities with evidence of posterior lenticonus (Figure 1B). In three affected individuals, opacification progressed to affect the central (nucleus) and anterior polar regions of the lens (IV:5, IV:6, IV:7; Figure 1A). In addition to cataracts, one affected individual had monocular amblyopia (III:8), and two others developed strabismus requiring corrective surgery (IV:1, IV:5). The age-at-diagnosis varied from birth to 15 years, and the age-at-surgery ranged from 0 to 44 years. Post-surgical corrected visual acuity varied from 20/20 to 20/70 in the better eye. (Table 2).
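To make the two-point LOD scores (Z) used in the linkage analysis more concrete, the following Python sketch computes Z for the simplest textbook case only: phase-known, fully informative meioses. MLINK handles general pedigrees, unknown phase, and penetrance models, none of which are reproduced here. The function name and the meiosis counts in the usage note are hypothetical and are not taken from family Mu.

```python
import math


def two_point_lod(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """LOD score Z(theta) for phase-known, fully informative meioses.

    Z(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ],
    where n is the number of meioses and k the number of recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n, k = n_meioses, n_recombinants
    if theta == 0.0:
        if k > 0:
            # Any recombinant excludes complete linkage.
            return float("-inf")
        log_likelihood_linked = 0.0
    else:
        log_likelihood_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_likelihood_unlinked = n * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked
```

Under these simplifying assumptions each non-recombinant meiosis contributes log10(2), about 0.30, at θ=0, so 12 hypothetical informative meioses without recombinants would give Z of about 3.61, and a single recombinant would drive Z at θ=0 to minus infinity.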
Mutation analysis: The SNP and STR intervals contained 211 and 219 positional-candidate genes, respectively, none of which were obvious functional candidates for cataracts in family Mu (NCBI Map Viewer). We prioritized genes for mutation analysis of exons and intron boundaries (splice-sites) primarily based on evidence of expression or function in the eye, from the NCBI UniGene expressed sequence tag (EST) database and the NEIBank bioinformatics resource for vision research [41]. Re-sequencing analysis of individuals II:1, III:8, and III:9 from the Mu pedigree (Figure 1A) excluded the presence of coding or splice-site mutations in several genes including ENO1 (GeneID: 2023), which according to NEIBank is abundantly expressed in the human lens. However, re-sequencing of a 17-exon gene symbolized EPHA2 (GeneID: 1969), which is also expressed in human lens (NEIBank), identified a heterozygous c.2842G>T transversion in exon-17 that was not present in wild type (Figure 3B). This single nucleotide change did not result in the gain or loss of a convenient restriction site; therefore, we designed allele-specific (G/T) PCR analysis to confirm that the mutant "T" allele cosegregated with affected but not unaffected members of family Mu (Figure 3C). Furthermore, when we tested the c.2842G>T transversion as a bi-allelic marker, with a notional allelic frequency of 1%, in a two-point LOD score analysis of the cataract locus (Table 3), we obtained further compelling evidence of linkage (Z=3.61, θ=0).
In addition, we confirmed that the c.2842G>T transversion was not listed in the NCBI SNP database (dbSNP), and excluded it as a SNP in a panel of 192 normal unrelated individuals (i.e., 384 chromosomes) using the allele-specific PCR analysis described in Figure 3C (data not shown). While it is possible that an undetected mutation lay elsewhere within the disease-haplotype interval, our genotype data strongly suggested that the c.2842G>T transversion in exon 17 of EPHA2 represented a causative mutation rather than a benign SNP in linkage disequilibrium with the cataract phenotype.
The c.2842G>T transversion identified above occurred at the first base of codon 948 (GGG>TGG), and was predicted to result in the missense substitution of glycine to tryptophan (p.G948W) at the level of translation (Figure 4), placing it in the cytoplasmic sterile-α-motif (SAM) domain of EPHA2 [42,43]. Cross-species alignment of the amino acid sequences for EPHA2 present in the Entrez protein database, performed by means of ClustalW multiple sequence alignment, revealed that p.G948 is evolutionarily conserved within the bony vertebrates (Figure 4B). Moreover, the predicted p.G948W substitution represented a non-conservative amino acid change, with the small neutral, polar side-group (-H) of glycine replaced by the much larger neutral, hydrophobic side-group (-CH2-C8H6N) of tryptophan. Position specific score matrix analysis (PSSM Viewer) revealed a marked decline in value from +6 to −6 (Table 4), confirming that the predicted p.G948W substitution occurred less frequently than expected in proteins with the conserved SAM domain (Conserved Domain Database [CDD] pfam00536), further raising the likelihood of functional consequences.
The disease associated allele is shown for each SNP in the cataract interval, which spans ~10 megabases (Mb) or ~19 centi-Morgans (cM). The asterisk denotes critical recombinant individual III:8 in family Mu (Figure 1A) with genotype T/T at rs707455. The double asterisk denotes critical recombinant individual IV:6 (Figure 1A) with genotype A/A at rs477558.
[45][46][47], and a locus (blue) for age-related cortical cataracts [44]. M, mega-base pairs.
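The statement that c.2842G>T falls on the first base of codon 948 and converts GGG (glycine) to TGG (tryptophan) can be checked with simple arithmetic. The Python sketch below is illustrative only: the helper names are hypothetical, and the codon table holds just the two standard-genetic-code entries needed here.

```python
from typing import Tuple


def codon_position(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base in codon 1..3)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# Only the two codons relevant to this variant (standard genetic code).
CODON_TABLE = {"GGG": "G", "TGG": "W"}


def apply_substitution(codon: str, base_in_codon: int, alt: str) -> str:
    """Return the codon after replacing the base at the given 1-based position."""
    bases = list(codon)
    bases[base_in_codon - 1] = alt
    return "".join(bases)


codon_number, base_in_codon = codon_position(2842)
mutant_codon = apply_substitution("GGG", base_in_codon, "T")
protein_change = f"p.{CODON_TABLE['GGG']}{codon_number}{CODON_TABLE[mutant_codon]}"
```

With these inputs, position 2842 maps to codon 948, base 1, the mutant codon is TGG, and the derived protein-level description is p.G948W, in agreement with the text.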
Association studies: EPHA2 resides within a large interval (D1S468-[~25.5 Mb]-D1S1622) on chromosome 1p (Figure 1C) that was previously linked with age-related cortical cataracts in families and sib-pairs from the Beaver Dam Eye Study [44]. To investigate the possibility that EPHA2 was associated with age-related cataracts, we performed candidate-gene SNP allele association studies in a case-control cohort from Northern Italy [34,35]. This clinically well-defined cohort was of similar European ancestry to that (Table 5). Of the cases, 28 had both nuclear and cortical cataracts, three had nuclear and posterior sub-capsular cataracts, 10 had cortical and posterior sub-capsular cataracts, and four had nuclear, cortical, and posterior sub-capsular cataracts.
In an initial screen, tagging SNPs covering the EPHA2 region were genotyped using 100 samples each from individuals with any nuclear, any cortical, and no age-related cataracts. Markers yielding p values <0.05 were further analyzed using the entire complement of samples (Appendix 1, Figure 5). All markers were in Hardy-Weinberg equilibrium (HWE, p>0.05) for all sample groups tested. The highest levels of association occurred with rs7543472 (allelic p<0.038 for cortical cataracts, and allelic p<0.021 for any age-related cataracts) and rs11260867 (allelic p<0.020 for cortical cataracts), which lie just distal (3′) to EPHA2 (Figure 5). However, there was no significant association with nuclear cataracts, although results with rs7543472 were suggestive (allelic p<0.071, Appendix 1). These findings were true whether the analysis was for pure cortical cataracts, any cortical cataracts, pure nuclear cataracts, or any nuclear cataracts. Since association was seen between SNPs in the EPHA2 region and cortical cataracts, and suggestive results were obtained with nuclear cataracts, the analysis was extended to include any age-related cataracts. This increased the association for SNPs from the peak region to p<0.021 with the T-allele of rs7543472, although the G-allele of rs11260867 actually lost significance slightly (p=0.076). Results for SNPs outside this region changed only minimally. Because the SNPs showing the greatest association lay over or just beyond exon 17, this region was sequenced in all samples. The only variation occurring with a frequency greater than 0.05 was rs3754334, a synonymous p.I958I (C3011T) variant. This variant had minor allele frequencies of ~34% in cataract cases and 28% in controls, and was thus analyzed further, although the association was not as strong as with the two distal (3′) SNPs (rs7543472 or rs11260867) and no significant allelic p values were obtained.
Allelic odds ratios ranged from 1.5 to 1.7 for rs7543472 and from 1.2 to 1.9 for rs11260867, but were not significant for rs3754334. The p values obtained were not corrected for multiple testing as these SNPs were tested, a priori, for association with age-related cataracts on the basis of their close proximity to a gene linked with inherited cataracts that also maps within a known linkage region for age-related cataracts, rather than as part of a genome-wide scan.
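As an illustration of how an allelic odds ratio such as those quoted above is derived from a 2x2 table of allele counts, the Python sketch below computes the odds ratio together with a Woolf (log-based) 95% confidence interval. This is a standard textbook method and is assumed here for illustration; the procedure used by the Exemplar program is not described in the text. The function name and the counts in the usage note are hypothetical and are not study data.

```python
import math
from typing import Tuple


def allelic_odds_ratio(
    case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower 95% bound, upper 95% bound).

    Inputs are allele counts (two alleles per individual) and must all be > 0.
    The interval uses Woolf's standard error of the log odds ratio.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper
```

For hypothetical counts of 30 risk and 10 other alleles in cases against 20 and 20 in controls, the odds ratio is 3.0; an interval that excludes 1 would correspond to a nominally significant association at the 5% level.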
Trend p values were very similar to allelic p values for all SNPs, but the specific genotypes show greater levels of association for all SNPs showing significant association (Appendix 1). The most significant was p<0.007 for cortical cataracts with rs11260867 (GG genotype), p<0.012 for cortical cataracts and p<0.01 for any cataracts with rs7543472 (TT genotype), and p<0.019 for cortical cataracts and p<0.028 for any cataracts with rs3754334 (TT genotype). In addition, association of nuclear cataracts with the TT genotype of rs7543472 is marginally significant at p<0.042 and suggestive with the TT genotype of rs3754334 at p<0.06. Genotypic odds ratios show the greatest risk increase in individuals homozygous for the risk alleles, ranging from 1.9 to 2.1 for rs7543472, 1.5-2.3 for rs11260867, and 2.7-3.3 for rs3754334, depending on the type of age-related cataracts being considered (Appendix 1). The association of a potential risk haplotype for these three SNPs was examined; however, the association level was found to be essentially that of the least associated SNP. This result was consistent with the observation that all but 4 affected individuals with the homozygous TT genotype for rs3754334 also had a homozygous TT genotype for rs7543472, and all but one had a homozygous GG genotype for rs11260867. Similarly, of affected individuals with a homozygous TT genotype for rs7543472 only a single individual did not have a homozygous GG genotype for rs11260867. However, because the genotype frequency of TT for rs3754334 was only ~15% it did not show strong linkage disequilibrium with rs11260867 (D´=0.018) or rs7543472 (D´=0.022). In contrast, rs11260867 and rs7543472, for which the risk genotypes have frequencies of ~80% and ~73% respectively, showed strong linkage disequilibrium (D´=0.896).
Differences in the physico-chemical properties of glycine and tryptophan are consistent with a non-conservative substitution. The negative PSSM value indicates that tryptophan substitutes for glycine at position 948 less frequently than expected among proteins containing the conserved SAM domain.
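The pairwise linkage disequilibrium values (D´) quoted for the three SNPs can be related to haplotype and allele frequencies with the short Python sketch below. It assumes that the two-locus haplotype frequency is already known; estimating it from unphased genotypes, as LD software must do, is not shown. The function name and example frequencies are hypothetical, and the absolute value is returned as a common reporting convention.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalized linkage disequilibrium |D'| between two bi-allelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max
```

With allele frequencies of 0.5 at both loci, a haplotype frequency of 0.5 gives complete disequilibrium (1.0), whereas 0.25, the product of the allele frequencies, gives 0.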

DISCUSSION
In this study, we have mapped an interval on human chromosome 1p36 for autosomal dominant posterior polar cataracts, and identified an underlying missense mutation (p.G948W) in the gene coding for EPHA2, which has not previously been associated with eye disease. Our disease interval maps in close proximity to autosomal dominant loci for the Volkmann congenital cataract (CCV) in a Danish family [45], posterior polar cataract (CCTP1) in a British family [46], and total congenital cataract in a Tasmanian family [47], raising the possibility of allelism (Figure 1C). In addition, EPHA2 resides within a large interval (D1S468-[~25.5 Mb]-D1S1622) on chromosome 1p (Figure 1C) that was previously linked with age-related cortical cataracts in families and sib-pairs from Beaver Dam, Wisconsin [44]. Here, we further demonstrate that variations in the EPHA2 region are associated with age-related cortical cataracts (p=0.007), and age-related cataracts overall (p=0.01) in an Italian population.
EPHA2 encodes a 976 amino acid, type-1 transmembrane protein (~108 kDa) with extracellular NH2-terminal and cytoplasmic COOH-terminal halves (Figure 4B) [42]. The extracellular region comprises a conserved Eph-ligand binding domain followed by a cysteine-rich EGF-like domain and two fibronectin type-III repeats, whereas the cytoplasmic region contains a conserved tyrosine kinase domain and a sterile-α-motif (SAM) domain [42]. The proposed p.G948W missense substitution, cosegregating with cataracts in family Mu, was predicted to reside in the cytoplasmic SAM domain of EPHA2. The evolutionarily conserved ~70 amino acid SAM domain has a compact helical structure that is believed to facilitate protein-protein interactions [43], raising the possibility that the predicted p.G948W substitution may interfere with Eph-receptor oligomerization and clustering into higher-order complexes essential for physiologic signaling [42].
In contrast to the heterozygous genotype and dominant phenotype linked with the p.G948W mutation in EPHA2, the SNPs in the EPHA2 region most associated with age-related cataracts displayed the highest odds ratios when homozygous for the risk allele, consistent with recessive effects. Furthermore, the p.G948W mutation was linked with posterior polar cataracts, which most resemble age-related posterior sub-capsular cataracts, whereas EPHA2 SNPs were associated with much more prevalent forms of age-related cataracts, in decreasing order of significance: cortical opacities > any opacities > nuclear opacities. While it is problematic to compare inherited and age-related forms of cataracts, it is noteworthy that some of the affected members of family Mu also developed nuclear and anterior polar opacities. These observations raise the possibility that EPHA2 variants may contribute to a common pathogenic mechanism resulting in different cataract phenotypes. The existence of any such mechanism remains to be determined; however, it is interesting that the two SNPs in the EPHA2 region most associated with age-related cataracts (rs7543472 and rs11260867) lie outside the coding region near the 3′-end of the gene, which has been shown to harbor highly conserved translational control sequences that are believed to facilitate an RNA-based post-transcriptional mechanism for localized regulation of gene expression within cells [48,49].
EPHA2 is a member of the largest sub-family of receptor tyrosine kinases, which are divided into 2 classes: the A-sub-class comprising 10 receptor genes (EPHA1-10) and the B-sub-class with 6 receptor genes (EPHB1-6) [50,51]. Eph receptors interact with their cognate membrane-anchored ligands, referred to as ephrins (EFNA1-6, EFNB1-3), at cell-cell contact sites to activate bidirectional signaling pathways affecting diverse physiologic processes including cell adhesion, repulsion, morphology, and migration [50,51]. Intriguingly, EPHA2 has been implicated in tumor angiogenesis and neurite outgrowth; however, little is known about its role in the development and homeostasis of the avascular, non-innervated lens. Recently, however, it has been reported that mice lacking ephrin-A5, a ligand of Epha2, develop cataracts as a result of impaired lens fiber cell adhesion [52]. Further characterization of Eph-ephrin signaling pathways in the lens should provide insight into the pathogenetic mechanisms linking EPHA2 dysfunction with cataractogenesis.
In conclusion, our data have associated EPHA2 with both inherited and age-related cataracts, and suggest that dysfunction of a member of the EphA-receptor tyrosine kinase sub-family triggers loss of lens transparency.
The cohort of unrelated individuals comprised a total of 213 age-related cataract cases and 104 clear lens controls, matched for age and gender. Cases with both nuclear and cortical cataracts (32) were counted in both the any cortical and any nuclear groups. Cases with any posterior sub-capsular cataracts (17) were not analyzed separately due to statistically insufficient numbers.
Figure 5. Graphical presentation of association between SNPs in the EPHA2 region of chromosome 1p with age-related cataracts in the Italian case-control cohort. The upper panel shows a plot of -log p-values (y-axis) from association analyses of 21 SNPs across the EPHA2 region. Blue diamonds denote -log p values for pure cortical cataracts; red squares, any cortical cataracts; green triangles, pure nuclear cataracts; purple x, any nuclear cataracts; and light blue asterisks, any cataracts. The x-axis shows the relative physical location of each SNP measured in mega-base-pairs. The lower panel shows a pairwise linkage disequilibrium (D´) Haploview plot for SNPs in the EPHA2 region. The strength of linkage disequilibrium (LD) is color-coded; red indicates strong LD with SNPs showing high correlation, and blue indicates low LD and high recombination. The relative positions of EPHA2 and the adjacent FAM131C are indicated with haplotype blocks for the European population (CEU) from the HapMap project (SNPbrowser, Applied Biosystems).

ACKNOWLEDGMENTS
We thank family and case-control members for participating in this study, Dr. Olivera Boskovska for help with ascertaining family Mu, Dr. Donna Mackay for preliminary linkage analysis, Raffaella Aldigeri and Francesca Grassi for DNA preparation from the Italian case-control cohort, and the Microarray Core Facility at Washington University Genome Center, St. Louis, MO for SNP genotyping. This work was supported by NIH/NEI grants EY012284 (to A.S.) and EY02687, and FIL 2002-2003 grants (to G.M.).